Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

Authors

  • Jason Naradowsky
  • Kristina Toutanova
Abstract

This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends hidden semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech tags, morphological segmentation) while learning a morpheme segmentation over the target language. Our model outperforms a competitive word alignment system in alignment quality. When used in a monolingual morphological segmentation setting, it substantially improves accuracy over previous state-of-the-art models on three Arabic and Hebrew datasets.
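
The semi-Markov idea behind the abstract can be illustrated with a minimal sketch. This is not the authors' model: it omits the factored outputs, the bilingual alignment, and the source-side context, and it only shows how a semi-Markov Viterbi pass scores whole segments (candidate morphemes) instead of single characters. The function name `segment`, the parameters `max_len` and `unk_logprob`, and the toy lexicon with its probabilities are all assumptions made up for illustration.

```python
import math


def segment(word, morph_logprob, max_len=6, unk_logprob=-20.0):
    """Return the best segmentation of `word` and its log-score.

    Zero-order semi-Markov Viterbi: each state emits a whole segment,
    so the recursion ranges over segment start points, not single steps.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score of word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            # Hypothetical fallback: unknown pieces pay a per-character penalty.
            emit = morph_logprob.get(piece, unk_logprob * len(piece))
            score = best[start] + emit
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the word to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(word[start:end])
        end = start
    return segments[::-1], best[n]


# Toy lexicon with invented probabilities (not taken from the paper).
toy_lexicon = {
    "un": math.log(0.1),
    "lock": math.log(0.05),
    "ed": math.log(0.1),
    "unlock": math.log(0.001),
}

segments, score = segment("unlocked", toy_lexicon)
print(segments, score)
```

With this toy lexicon the three-morpheme analysis un + lock + ed scores higher than unlock + ed, because the product of the three segment probabilities (0.0005) exceeds that of the two-segment analysis (0.0001). In an unsupervised setting the segment probabilities would be learned rather than fixed by hand.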

Similar resources

Unsupervised segmentation of hidden semi-Markov non-stationary chains

In the classical hidden Markov chain (HMC) model we have a hidden chain X, which is a Markov chain, and an observed chain Y. HMCs are widely used; however, in some situations they have to be replaced by the more general “hidden semi-Markov chains” (HSMC), which are particular “triplet Markov chains” (TMC) T = (X, U, Y), where the auxiliary chain U models the semi-Markovianity of X. Otherwis...

A Phrase-Based Hidden Semi-Markov Approach to Machine Translation

Statistically estimated phrase-based models promised to further the state of the art; however, several works reported a performance decrease with respect to heuristically estimated phrase-based models. In this work we present a latent variable phrase-based translation model, inspired by hidden semi-Markov models, that does not degrade the system. Experimental results report an improvement ov...

Unsupervised segmentation of randomly switching data hidden with non-Gaussian correlated noise

Hidden Markov chains (HMC) are a very powerful tool in hidden data restoration and are currently used to solve a wide range of problems. However, when these data are not stationary, estimating the parameters, which are required for unsupervised processing, poses a problem. Moreover, taking into account correlated non-Gaussian noise is difficult without model approximations. The aim of this pape...

Segmenting Continuous Motions with Hidden Semi-markov Models and Gaussian Processes

Humans divide perceived continuous information into segments to facilitate recognition. For example, humans can segment speech waves into recognizable morphemes. Analogously, continuous motions are segmented into recognizable unit actions. People can divide continuous information into segments without using explicit segment points. This capacity for unsupervised segmentation is also useful for ...

Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context

Unsupervised segmentation of phoneme sequences is an essential process to obtain unknown words during spoken dialogues. In this segmentation, an input phoneme sequence without delimiters is converted into segmented sub-sequences corresponding to words. The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its performance degrades when it is applied to phoneme-level word seg...

Publication date: 2011